Playing with the kappa metric

Published on 2024-07-29

For the past week, I've been playing with Nassim Taleb's $\kappa$-metric. A lot of real-world data doesn't actually fit a "standard" normal distribution, so playing with the $\kappa$-metric is part of a bigger exploration of mine. Namely, I'm exploring the different ways to measure which distribution a dataset best fits, whether that's the normal distribution or something else, along with the related preasymptotic behavior.

The $\kappa$-metric

The $\kappa$-metric measures a distribution's rate of convergence to a Lévy $\alpha$-stable basin (the normal distribution being one) as the number $n$ of independent and identically distributed (i.i.d.) summands increases. In short, it tells you how much data you'd need before sums (or averages) of that data start to behave as if they were normally distributed.

I'll just cut to the chase with the following formula:

$$\kappa(n_0, n) = 2 - \frac{\log(n) - \log(n_0)}{\log\left(\frac{\mathbb{M}(n)}{\mathbb{M}(n_0)}\right)}. \tag{1}$$

Further, $\mathbb{M}(n)$ is the mean absolute deviation from the mean for $n$ summands. It's calculated as follows:

$$\mathbb{M}(n) = \mathbb{E}\left[\left|S_n - \mathbb{E}[S_n]\right|\right]. \tag{2}$$

If you're not familiar, $S_n = X_1 + \dots + X_n$ is the sum of the $n$ summands, and $\mathbb{E}[S_n]$ is the expected value (mean) of that sum.
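As a sanity check on expression (1), here is a short worked example that is not part of the original note. Assume the summands are i.i.d. Gaussian with standard deviation $s$ (a symbol introduced here only for illustration). Then $S_n$ is Gaussian with standard deviation $s\sqrt{n}$, and the mean absolute deviation of a Gaussian is $\sqrt{2/\pi}$ times its standard deviation, so

$$\mathbb{M}(n) = s\sqrt{\frac{2n}{\pi}}, \qquad \frac{\mathbb{M}(n)}{\mathbb{M}(n_0)} = \sqrt{\frac{n}{n_0}},$$

$$\kappa(n_0, n) = 2 - \frac{\log(n) - \log(n_0)}{\frac{1}{2}\left(\log(n) - \log(n_0)\right)} = 0.$$

In other words, the Gaussian case sits at $\kappa = 0$, and values above 0 point to slower convergence.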

What's the $\kappa$-metric for the lognormal distribution?

Lognormal distributions come up often in digital marketing data, economic data, and other datasets generated by somewhat scalable, complex systems. So going over the $\kappa$-metric for this type of distribution is useful.

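For the lognormal distribution, $\kappa_1 = \kappa(1, 2)$ (that is, $n_0 = 1$ and $n = 2$) has the following closed-form approximation, where $\rho$ is the standard deviation of the underlying normal distribution (labeled $\sigma$ in the figures):

$$\kappa_1 \approx 2 - \frac{\log(2)}{\log\left(\frac{2\,\text{erf}\left(\frac{\sqrt{\log\left(\frac{1}{2}\left(e^{\rho^{2}}+1\right)\right)}}{2\sqrt{2}}\right)}{\text{erf}\left(\frac{\rho}{2\sqrt{2}}\right)}\right)}. \tag{3}$$

Here, $\text{erf}(\cdot)$ is the error function:

$$\text{erf}(z) = \frac{2}{\sqrt{\pi}}\int_{0}^{z} e^{-t^2} \, dt. \tag{4}$$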

Running some quick code (linked below) that uses expression $(3)$ to calculate $\kappa_1$ for $\rho$ values between 0 and 3 for the lognormal distribution gives the plot shown in Figure 1.

Figure 1: Plot of the lognormal distribution's κ-metric as σ increases.
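For illustration, expression $(3)$ can be evaluated with nothing but the Python standard library. This is a minimal sketch rather than the code linked at the end of this note; the function name `kappa1_lognormal` and the grid of $\rho$ values are my own choices for the example.

```python
import math


def kappa1_lognormal(rho: float) -> float:
    """Closed-form approximation of kappa_1 = kappa(1, 2), per expression (3).

    `rho` is the standard deviation of the underlying normal distribution.
    """
    if rho <= 0:
        raise ValueError("rho must be positive")
    scale = 2.0 * math.sqrt(2.0)
    # Standard deviation (in log space) of the lognormal that stands in
    # for the sum of two summands in expression (3).
    rho_sum = math.sqrt(math.log(0.5 * (math.exp(rho ** 2) + 1.0)))
    # Ratio M(2) / M(1) from expression (3).
    ratio = 2.0 * math.erf(rho_sum / scale) / math.erf(rho / scale)
    return 2.0 - math.log(2.0) / math.log(ratio)


if __name__ == "__main__":
    # Hypothetical grid; rho = 0 is excluded because the ratio is undefined there.
    for rho in (0.25, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0):
        print(f"rho = {rho:4.2f}  kappa_1 = {kappa1_lognormal(rho):.4f}")
```

Feeding these values into any plotting library should give a curve that starts near 0 for small $\rho$ and climbs toward 1 as $\rho$ grows.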

While it's great there's a closed-form approximation of $\kappa_1$ for lognormal distributions, this isn't the case for all distributions. Often, it is very challenging (if not impossible) to find such closed-form expressions. When this happens, you have to simulate: generate the $n$ random variables, calculate $S_n$, repeat this many times to estimate $\mathbb{M}(n)$, and finally compute $\kappa(n_0, n)$.
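The simulation steps above can be sketched as follows. Again, this is an assumption-laden illustration and not the code linked at the end of this note: `kappa_mc`, `mean_abs_deviation`, the `draw` callback, and the default of 200,000 simulations are all hypothetical choices made for the example.

```python
import math
import random
from statistics import fmean


def mean_abs_deviation(values):
    """Sample estimate of M = E|S - E[S]| from a list of simulated sums."""
    center = fmean(values)
    return fmean(abs(v - center) for v in values)


def kappa_mc(draw, n0, n, n_sims=200_000, seed=0):
    """Monte Carlo estimate of kappa(n0, n) from expression (1).

    `draw` is a hypothetical callback: it takes a random.Random instance
    and returns one i.i.d. sample from the distribution of interest.
    """
    if not 0 < n0 < n:
        raise ValueError("need 0 < n0 < n")
    rng = random.Random(seed)
    sums_n0, sums_n = [], []
    for _ in range(n_sims):
        xs = [draw(rng) for _ in range(n)]
        # The first n0 draws are reused for S_{n0}; this is a convenience,
        # and independent draws would work as well.
        sums_n0.append(sum(xs[:n0]))
        sums_n.append(sum(xs))
    m_n0 = mean_abs_deviation(sums_n0)
    m_n = mean_abs_deviation(sums_n)
    return 2.0 - (math.log(n) - math.log(n0)) / math.log(m_n / m_n0)


def gaussian(rng):
    return rng.gauss(0.0, 1.0)


def lognormal(rng):
    # Lognormal whose underlying normal has mean 0 and standard deviation 1.
    return rng.lognormvariate(0.0, 1.0)


if __name__ == "__main__":
    print("Gaussian  kappa_1 ~", kappa_mc(gaussian, 1, 2))
    print("Lognormal kappa_1 ~", kappa_mc(lognormal, 1, 2))
```

The estimates are noisy, and the noise gets worse as the tails get fatter, so larger parameter values generally need more simulations before the curve settles down.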

Running code (also linked below) that performs these calculations to get $\kappa_1$ for $\rho$ values between 0 and 3 for the lognormal distribution gives a plot similar to Figure 2.

Figure 2: Plot of the lognormal distribution's κ-metric as σ increases using numerical calculations.

I may explore other distributions and their $\kappa$-metric values across varying parameter values just to see the different effects. For now, this is a good start, though.

Link to Code

The code used for calculating expressions and their results in this note can be found here.

Reference

The following are important references used and/or related to this note:

How Much Data Do You Need? A Pre-asymptotic Metric for Fat-tailedness

Timothy Rollings